“New paradigms” in MT: the state of play now that the dust has settled
Author
Harold L. Somers, Centre for Computational Linguistics, UMIST, PO Box 88, Manchester M60 1QD, England
Abstract
1. Background
In 1988, at the Second TMI conference at Carnegie Mellon University, IBM’s Peter Brown shocked the audience by presenting an approach to Machine Translation (MT) that was quite unlike anything most of the audience had ever seen or even imagined. IBM’s “purely statistical” approach was inspired by successes in speech processing and characterised by the infamous statement “Every time I fire a linguist, my system’s performance improves”. It flew in the face of all the received wisdom about how to do MT at the time, eschewing the rationalist, linguistic approach in favour of an empirical, corpus-based one. There followed something of a flood of “new” approaches to MT. Few were as overtly statistical as the IBM approach, but all shared the use of a corpus of translation examples, rather than linguistic rules, as a significant component. This apparent difference was often seen as a confrontation, notably at the 1992 TMI conference in Montreal, which had the explicit theme “Empiricist vs. Rationalist Methods in MT” (Isabelle, 1992), although by that date most researchers were already developing hybrid solutions using both corpus-based and theory-based techniques.
The heat has largely evaporated from the debate, so that the “new” approaches are now considered mainstream, in contrast though not in conflict with the older rule-based approaches.
In this paper, we review the achievements of a range of approaches to corpus-based MT, which we consider variants of “example-based MT” (EBMT). Individual authors have used alternative names, perhaps to bring out some key difference that distinguishes their own approach: “analogy-based”, “memory-based”, and “case-based” have all been used.
These approaches all have in common the use of a corpus or database of already translated examples. They involve a process of matching a new input against this database to extract suitable examples, which are then recombined in an analogical manner to determine the correct translation.
Two variants of the corpus-based approach stand somewhat apart from this scenario. One, which we will not discuss at all in this paper, is the connectionist or neural-network approach. So far, only a little work has been done in this area, and with not very promising results (see Waibel et al., 1991; McLean, 1992; Castaño et al., 1997; Koncar & Guthrie, 1997). The other major “new paradigm” is the purely statistical approach already mentioned, usually identified with the IBM group’s Candide system (Brown et al., 1990, 1993), though the approach has also been taken up by a number of other researchers. The statistical approach is clearly example-based in that it depends on a bilingual corpus, but it implements the matching and recombination stages that characterise EBMT quite differently. More significantly, the important issues for the statistical approach are somewhat different, focusing, as one might expect, on the mathematical aspects of estimating statistical parameters for the language models. Nevertheless, we will try to include these approaches in our overview.
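As a rough illustration of the match-and-recombine cycle that characterises EBMT, the following Python sketch retrieves the stored example whose source side is most similar to the input and adapts that example's translation by swapping single words through a bilingual lexicon. Word-level similarity is computed with the standard-library `difflib.SequenceMatcher`. The example base, the lexicon, and the function names `best_match` and `translate` are hypothetical. The one-word substitution rule is a deliberately simplified stand-in for the recombination step, not the method of any particular system surveyed here.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical toy example base of (source, target) translation pairs.
EXAMPLE_BASE = [
    ("he buys a book", "il achète un livre"),
    ("she reads a newspaper", "elle lit un journal"),
]

# Hypothetical word-level bilingual lexicon, used only for recombination.
LEXICON = {"book": "livre", "newspaper": "journal", "letter": "lettre"}


def best_match(tokens: list[str]) -> tuple[list[str], list[str]]:
    """Matching: return the stored pair whose source side is most similar."""
    def score(pair: tuple[str, str]) -> float:
        # Similarity ratio over word tokens (1.0 means identical sequences).
        return SequenceMatcher(None, pair[0].split(), tokens).ratio()

    src, tgt = max(EXAMPLE_BASE, key=score)
    return src.split(), tgt.split()


def translate(sentence: str) -> Optional[str]:
    """Retrieve the closest example, then adapt its translation.

    Returns None when the input differs from the retrieved example by
    anything other than one-for-one word replacements the lexicon covers.
    """
    tokens = sentence.lower().split()
    src, tgt = best_match(tokens)
    matcher = SequenceMatcher(None, src, tokens)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # This sketch only handles single-word replacements.
        if tag != "replace" or i2 - i1 != 1 or j2 - j1 != 1:
            return None
        old_word, new_word = src[i1], tokens[j1]
        if old_word not in LEXICON or new_word not in LEXICON:
            return None
        # Recombination: swap the aligned target word for its replacement.
        old_t, new_t = LEXICON[old_word], LEXICON[new_word]
        tgt = [new_t if w == old_t else w for w in tgt]
    return " ".join(tgt)


print(translate("he buys a newspaper"))  # il achète un journal
print(translate("she reads a book"))     # elle lit un livre
```

Real systems match on richer representations and must cope with the agreement and word-order problems that arise when fragments from different examples are recombined. This sketch simply gives up and returns `None` whenever the difference is more than a one-for-one word swap.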